SenGen: Sentence Generating Neural Variational Topic Model
Authors
Abstract
We present a new topic model that generates documents by sampling a topic for one whole sentence at a time, and generating the words in the sentence using an RNN decoder conditioned on the topic of the sentence. We argue that this novel formalism will not only help us better visualize and model the topical discourse structure of a document, but also potentially lead to more interpretable topics, since we can now illustrate topics by sampling representative sentences instead of bags of words or phrases. We present a variational auto-encoder approach for learning, in which we use a factorized variational encoder that independently models the posterior over topical mixture vectors of documents using a feedforward network, and the posterior over topic assignments to sentences using an RNN. Our preliminary experiments on two different datasets indicate early promise, but also expose many challenges that remain to be addressed.
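The generative process described above can be sketched as a small numerical simulation: sample a document-level topic mixture, draw one topic per sentence, and unroll a toy RNN decoder conditioned on that topic. This is a minimal sketch of the sampling side only, not the paper's implementation; all parameter names, sizes, and the Dirichlet prior on the topic mixture are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5   # number of topics (assumed)
V = 50  # toy vocabulary size (assumed)
H = 16  # RNN hidden size (assumed)

# Random parameters standing in for learned weights (all assumptions).
topic_emb = rng.normal(size=(K, H))           # topic embeddings that condition the decoder
W_hh = rng.normal(scale=0.1, size=(H, H))     # recurrent weights
W_ho = rng.normal(scale=0.1, size=(H, V))     # hidden-to-vocabulary projection
word_emb = rng.normal(scale=0.1, size=(V, H)) # input word embeddings

def sample_sentence(topic, max_len=8):
    """Sample word ids from a toy RNN decoder initialized with the sentence topic."""
    h = np.tanh(topic_emb[topic])  # hidden state seeded by the topic
    x = np.zeros(H)                # start-of-sentence input
    words = []
    for _ in range(max_len):
        h = np.tanh(W_hh @ h + x)
        logits = W_ho.T @ h
        p = np.exp(logits - logits.max())
        p /= p.sum()               # softmax over the vocabulary
        w = int(rng.choice(V, p=p))
        words.append(w)
        x = word_emb[w]
    return words

def generate_document(n_sentences=3, alpha=1.0):
    # Document-level topic mixture theta ~ Dirichlet(alpha), an assumed prior.
    theta = rng.dirichlet(alpha * np.ones(K))
    doc = []
    for _ in range(n_sentences):
        z = int(rng.choice(K, p=theta))  # one topic for the whole sentence
        doc.append((z, sample_sentence(z)))
    return theta, doc

theta, doc = generate_document()
```

Note the contrast with word-level topic models: the topic variable `z` is drawn once per sentence, so every word in a sentence shares the same conditioning vector.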
Similar resources
Generating Sentences from a Continuous Space
The standard recurrent neural network language model (rnnlm) generates sentences one word at a time and does not work from an explicit global sentence representation. In this work, we introduce and study an rnn-based variational autoencoder generative model that incorporates distributed latent representations of entire sentences. This factorization allows it to explicitly model holistic propert...
Learning Task Specific Distributed Paragraph Representations Using a 2-Tier Convolutional Neural Network
We introduce a type of two-tier convolutional neural network model for learning distributed paragraph representations for a special task (e.g. topic classification or sentiment classification). We decompose the paragraph semantics into three cascaded constitutes: word representation, sentence composition and document composition. Specifically, we learn distributed word representations by a cont...
Variational Neural Machine Translation
Models of neural machine translation are often from a discriminative family of encoder-decoders that learn a conditional distribution of a target sentence given a source sentence. In this paper, we propose a variational model to learn this conditional distribution for neural machine translation: a variational encoder-decoder model that can be trained end-to-end. Different from the vanilla encod...
Sentence Level Recurrent Topic Model: Letting Topics Speak for Themselves
We propose Sentence Level Recurrent Topic Model (SLRTM), a new topic model that assumes the generation of each word within a sentence to depend on both the topic of the sentence and the whole history of its preceding words in the sentence. Different from conventional topic models that largely ignore the sequential order of words or their topic coherence, SLRTM gives full characterization to the...
Multi-Document Summarization using Sentence-based Topic Models
Most of the existing multi-document summarization methods decompose the documents into sentences and work directly in the sentence space using a term-sentence matrix. However, the knowledge on the document side, i.e. the topics embedded in the documents, can help the context understanding and guide the sentence selection in the summarization procedure. In this paper, we propose a new Bayesian s...
Journal: CoRR
Volume: abs/1708.00308
Pages: -
Publication date: 2017